fix(quickstart): make the one-command path actually reach first fork#243
Merged
Conversation
E2E'd `forkd quickstart` end-to-end in an isolated privileged container (host networking untouched). It crashed at six distinct points before a clean run — every one a wall a new user hits in their first minute: 1. **install-guest-kernel.sh hardcodes `sudo`** — absent in minimal environments (containers, slim images) where quickstart already runs as root. Added a `$(id -u)==0 ? "" : "sudo"` shim. 2. **`file`/`strings` assumed present** in the same script's post-install descriptor lines — guarded with `command -v`. 3. **`$USER` unbound under `set -u`** in host-tap.sh + netns-setup.sh (unset in `docker exec` / non-login shells). Fall back through `id -un`. 4. **unpack baked the packing host's absolute paths** into snapshot.json, so the first restore on any other machine failed with FC's "Failed to open snapshot file". `hub::unpack` now rewrites `vmstate`/`memory` to the extraction dir; main.rs re-runs the rewrite AFTER the staging→final `rename(2)` (v1 and v2 chain paths) since the in-unpack pass points at the soon-stale staging dir. Operates on `serde_json::Value` so unknown fields and volume paths survive. Two unit tests. 5. **build-rootfs.sh hardcodes `sudo`** at 18 sites — same shim. 6. **tarball installs can't find `scripts/` or `rootfs-init/`** — the `from-image` bake shells out to build-rootfs.sh, which finds the guest init+agent via `$(dirname $0)/../rootfs-init`. quickstart now stages the embedded scripts as `<base>/scripts/*` with the init files in the sibling `<base>/rootfs-init/` and points FORKD_SCRIPTS_DIR there. Without this the guest booted with no init → `Kernel panic … init /forkd-init.sh failed (error -2)` → FC exits → pause fails. Also: doctor::preflight() softens the three non-gating rows (kernel/tap/ docker) from ✗ to ⚠ — a red ✗ immediately followed by "quickstart sets this up" read as a contradiction. Clean run now: preflight 7✓ → consent heal → docker bake (5.3 s snapshot) → fork 4 children in 59 ms wall-clock, 4/4 alive. Transcript saved. Note: the hub-pull fallback path surfaced a separate, deeper portability bug (snapshots aren't relocatable across hosts — rootfs absolute path in the vmstate + rootfs not shipped in packs); filed as #242. quickstart's preferred local-bake path is unaffected and prints a warning before the hub fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
Follows up on #240's open promise: "E2E on a live host: pending — will run before tagging the next release." Done — and it found six blocking bugs, all in the first-run path a new user crosses.
E2E ran inside a
--privileged --device /dev/kvmDocker container on the dev box, so host networking was untouched (tap/netns/iptables all live in the container's own netns — same isolation model FC's own devtool uses).Six bugs, six fixes
install-guest-kernel.shhardcodessudosudo: command not foundin minimal/container envs running as rootfile/stringsassumed presentfile: command not foundafter a successful kernel install$USERunbound underset -u(host-tap + netns-setup)USER: unbound variableindocker exec/non-login shellsFailed to open snapshot file: No such file or directoryon first restorebuild-rootfs.shhardcodessudo(18 sites)sudo: command not foundmid-bakescripts/+rootfs-init/Kernel panic … init /forkd-init.sh failed (error -2)→ FC exits → pause failsPlus a UX fix:
doctor::preflight()softens the three non-gating rows (kernel/tap/docker) from red ✗ to ⚠ — a ✗ immediately followed by "quickstart sets this up" read as a contradiction.Fixes 1–3, 5 use a uniform privilege shim (
$(id -u)==0 ? "" : "sudo"). Fix 4 rewritesvmstate/memorypaths inhub::unpack+ re-runs post-rename(v1 and v2 chain paths) viaserde_json::Valueso unknown fields survive — 2 unit tests. Fix 6 stages embedded scripts as<base>/scripts/*with init files in the sibling<base>/rootfs-init/.Clean E2E transcript (tail)
59 ms wall-clock for 4 children off a freshly-baked snapshot, 4/4 alive.
Scope note
The hub-pull fallback surfaced a separate, deeper bug — snapshots aren't portable across hosts (rootfs absolute path baked in the vmstate + rootfs not shipped in packs). Filed as #242. quickstart's preferred local-bake path is unaffected and already warns before the hub fallback.
Test plan
cargo clippy --all-targets -D warnings+cargo fmt --checkcleancargo test -p forkd-cli hub::12/12 (incl. 2 new path-rewrite tests)bash -non all 4 modified scripts🤖 Generated with Claude Code